Optimization of the design of a synchronous digital circuit

ABSTRACT

The design of a synchronous digital circuit (1) can be modified. The circuit comprises a number of clocked storage devices (2, 3, 4, 5) and a number of combinational logic elements defining combinational paths (6, 7, 8, 9) between at least some of said clocked storage devices. Each combinational path from an output of one clocked storage device to an input of another has a minimum delay value (D_(min)) and a maximum delay value (D_(max)). The actual delay of the path assumes a value between the minimum and maximum delay values. The method comprises the steps of identifying the path (6; 7; 8; 9) having the greatest difference between the maximum delay value (D_(max)) and the minimum delay value (D_(min)), and reducing said difference by increasing the minimum delay value for the path having the greatest difference. With the method a higher clock frequency for the circuit can be achieved.

This application is a national stage application of PCT International Application No. PCT/EP02/10750, filed 26 Sep. 2002, which claims priority from European Patent Application No. 01610111.5 filed 29 Oct. 2001 and from U.S. Provisional Application No. 60/330,856 filed 1 Nov. 2001.

TECHNICAL FIELD OF THE INVENTION

The invention relates to a method of modifying the design of a synchronous digital circuit comprising a number of clocked storage devices and a number of combinational logic elements defining combinational paths between at least some of said clocked storage devices, each combinational path from an output of a first one of said clocked storage devices to an input of a second one of said clocked storage devices having a minimum delay value and a maximum delay value, such that the actual delay of said path assumes a value between the minimum delay value and the maximum delay value. The invention further relates to a system for modifying the design of such a circuit, and to a computer readable medium having instructions for causing a processing unit to execute the method.

DESCRIPTION OF RELATED ART

In digital synchronous circuits clock signals are used to synchronize computations. Digital signals are stored in storage elements awaiting a synchronizing clock pulse. The storage elements are typically interconnected by combinational logic. Each storage element delays the signal by a single clock period. Synchronizing the storage elements with clock signals reduces the uncertainty in delay between sending and receiving signals in the storage elements. The storage elements, such as registers, latches and flip-flops, sample output signals of the combinational logic, preserve the values internally as the state of the circuit, and make the state available for new computations after a certain delay.

Pushing the frequency of the clock signal of a digital synchronous circuit towards higher frequencies to obtain a higher rate of calculation in the logic has been, is still, and will most likely continue to be one of the most important optimization objectives in the design of digital synchronous circuits.

Most current schemes for optimizing the maximal clock frequency of digital circuits are focused on circuits with so-called zero-skew or minimal skew clock distribution. This zero-skew or minimal skew clock distribution is based on distributing clock signals to storage elements concentrating on ensuring a high degree of synchronism of all clock signals. The clock signals are typically distributed in a tree-like structure, whereby delays in different branches can be balanced to a high degree. The major benefit of such schemes is that uniformity brings predictability and simplifies the overall design problem. Zero-skew or minimal skew clock distribution is e.g. known from U.S. Pat. No. 5,122,679, U.S. Pat. No. 5,852,640 and U.S. Pat. No. 6,025,740. However, the performance of this kind of circuit is limited by the longest combinatorial delay among local paths between any pair of storage elements.

Alternatives to the zero- and minimal-skew clock distribution scheme exist, but are less frequently used. In unidirectional pipelines it is common practice to distribute the clock signal in the direction opposite to the data flow. However, complex ASIC designs are rarely suitable for this method, since their data flow is complex and irregular.

Performance tuning through intentional clock skew is also used, either through explicit designer decisions to redistribute computation time between two pipeline stages, or through the use of special CAD tools, such as the tool “ClockWise” offered by Ultima Interconnect Technologies. The theoretical limit for the performance of an intentional clock skew scheme should be defined by the mean value of the longest delays in the loop having the highest mean value of the longest delays. However, practice has shown that the highest obtainable clock frequency is considerably lower than the theoretical limit, because it is also limited by other factors.

The intentional clock skew scheme is also used in combination with other methods. H. Sathyamurthy et al, “Speeding up Pipelined Circuits through a Combination of Gate Sizing and Clock Skew Optimization” describes an algorithm in which manipulation of clock skew is combined with gate sizing, i.e. reduction of the delay of e.g. a gate by changing the dimensions of the transistors of the gate. However, gate sizing implies an increased circuit area and a higher power dissipation of the circuit. T. Soyata et al, “Integration of Clock Skew and Register Delays into a Retiming Algorithm” (0-7803-1254-5/93), IEEE, 1993 combines the use of clock skew with a retiming process in which registers of a synchronous circuit are relocated within the circuit in order to achieve a higher clock frequency. This relocation of registers is a very complex process for complicated circuits, because the relocation of a register typically requires the use of several new registers to replace the one that was relocated.

Therefore, it is an object of the invention to provide a method of the above-mentioned type in which the clock frequency of a synchronous digital circuit can be increased in a relatively simple way without the use of the very complex or power consuming methods mentioned above.

SUMMARY

According to the invention the object is achieved in that the method comprises the steps of identifying the combinational path having the largest difference between the maximum delay value and the minimum delay value, and reducing said difference between the maximum delay value and the minimum delay value by increasing the minimum delay value for said combinational path having the largest difference.

The minimum delay value of a combinational path can often be increased easily, and thus this is a very simple way of reducing the difference between the maximum delay value and the minimum delay value. Since the greatest one of these difference values can be shown to be the lower limit for the usable clock period, a reduction will allow a shorter clock period and thus a higher clock frequency of the circuit.

When the greatest difference in case of parallel paths is calculated as the difference between the highest maximum delay value and the lowest minimum delay value, it is ensured that the situation where one path has the lowest minimum delay value and another the highest maximum delay value is also taken into account. When further the maximum delay value for a sequential path is calculated as the sum of the maximum delay values for the paths comprised in the sequential path, and the minimum delay value for a sequential path is calculated as the sum of the minimum delay values for the paths comprised in the sequential path, also this situation can be taken into account.

When the step of increasing the minimum delay value for a combinational path is performed by inserting a number of buffers in the combinational path, a very simple and cost effective method is achieved.

When the method further comprises the steps of identifying among sequential paths from an input to an output of the circuit and sequential paths defining loops in the circuit the sequential path having the highest mean value of the maximum delay values, calculating said highest mean value of the maximum delay values, identifying those paths for which the difference between the maximum delay value and the minimum delay value exceeds said highest mean value of the maximum delay values, and reducing said differences exceeding the highest mean value of the maximum delay values to be less than or equal to said highest mean value of the maximum delay values, it is possible to design circuits that can be clocked with the highest possible clock frequency, because the highest mean value of the maximum delay values is the lower limit for the clock period for a circuit where the input and the output should be clocked simultaneously, or for circuits in which loops occur due to feed-back couplings.

As mentioned, the invention also relates to a system for modifying the design of a synchronous digital circuit comprising a number of clocked storage devices and a number of combinational logic elements defining combinational paths between at least some of said clocked storage devices, each combinational path from an output of a first one of said clocked storage devices to an input of a second one of said clocked storage devices having a minimum delay value and a maximum delay value, such that the actual delay of said path assumes a value between the minimum delay value and the maximum delay value.

When the system comprises means for identifying the combinational path having the greatest difference between the maximum delay value and the minimum delay value, and means for reducing said difference between the maximum delay value and the minimum delay value by increasing the minimum delay value for said combinational path having the largest difference, the system will be able to increase the clock frequency of a synchronous digital circuit in a relatively simple way without the use of the very complex or power consuming methods mentioned above. The minimum delay value of a combinational path can often be increased easily, and thus this is a very simple way of reducing the difference between the maximum delay value and the minimum delay value. Since the greatest one of these difference values can be shown to be the lower limit for the usable clock period, a reduction will allow a shorter clock period and thus a higher clock frequency of the circuit.

When the system is adapted to calculate the largest difference in case of parallel paths as the difference between the highest maximum delay value and the lowest minimum delay value, it is ensured that the situation where one path has the lowest minimum delay value and another the highest maximum delay value is also taken into account. When the system is further adapted to calculate the maximum delay value for a sequential path as the sum of the maximum delay values for the paths comprised in the sequential path, and to calculate the minimum delay value for a sequential path as the sum of the minimum delay values for the paths comprised in the sequential path, also this situation can be taken into account.

When the system is adapted to increase the minimum delay value for a combinational path by the insertion of a number of buffers in the combinational path, a simple and cost effective system is achieved.

When the system further comprises means for identifying among sequential paths from an input to an output of the circuit and sequential paths defining loops in the circuit the sequential path having the highest mean value of the maximum delay values, means for calculating said highest mean value of the maximum delay values, means for identifying those paths for which the difference between the maximum delay value and the minimum delay value exceeds said highest mean value of the maximum delay values, and means for reducing said differences exceeding the highest mean value of the maximum delay values to be less than or equal to said highest mean value of the maximum delay values, it is possible to design circuits that can be clocked with the highest possible clock frequency, because the highest mean value of the maximum delay values is the lower limit for the clock period for a circuit where the input and the output should be clocked simultaneously, or for circuits in which loops occur due to feed-back couplings.

As mentioned, the invention further relates to a computer readable medium having stored therein instructions for causing a processing unit to execute the above method. With this medium a system as described above can be implemented on a normal computer.

BRIEF DESCRIPTION OF THE DRAWINGS

The invention will now be described more fully below with reference to the drawings, in which

FIG. 1 shows a synchronous digital circuit in which the invention can be applied,

FIG. 2 shows an example of a combinational circuit that can be used in the circuit of FIG. 1,

FIG. 3 shows the circuit of FIG. 1 with examples of specific values of the delays,

FIG. 4 shows a timing diagram for the circuit of FIG. 3,

FIG. 5 shows an alternative timing diagram for the circuit of FIG. 3,

FIG. 6 shows the circuit of FIG. 3 modified with intentional clock skew,

FIG. 7 shows a timing diagram for the circuit of FIG. 6,

FIG. 8 shows a table of difference values calculated for the circuit of FIG. 6,

FIG. 9 shows the combinational circuit of FIG. 2 modified to have a longer minimum delay value,

FIG. 10 shows a table of difference values calculated with extended minimum delay values,

FIG. 11 shows the circuit of FIG. 6 modified according to the table of FIG. 10,

FIG. 12 shows a timing diagram for the circuit of FIG. 11,

FIG. 13 shows an alternative table of difference values calculated with extended minimum delay values,

FIG. 14 shows the circuit of FIG. 6 modified according to the table of FIG. 13,

FIG. 15 shows a timing diagram for the circuit of FIG. 14,

FIG. 16 shows the circuit of FIG. 1 with alternative examples of specific values of the delays,

FIG. 17 shows a timing diagram for the circuit of FIG. 16,

FIG. 18 shows a table of difference values calculated for the circuit of FIG. 16,

FIG. 19 shows a timing diagram for the circuit of FIG. 16 modified with intentional clock skew,

FIG. 20 shows a table of difference values for the circuit of FIG. 16 calculated with extended minimum delay value for one path,

FIG. 21 shows a timing diagram for the circuit of FIG. 16 modified according to the table of FIG. 20,

FIG. 22 shows a table of difference values for the circuit of FIG. 16 calculated with extended minimum delay values for two paths,

FIG. 23 shows the circuit of FIG. 16 modified according to the table of FIG. 22, and

FIG. 24 shows a timing diagram for the circuit of FIG. 23.

DETAILED DESCRIPTION OF EMBODIMENTS

FIG. 1 illustrates an example of a synchronous digital circuit 1 having four registers 2, 3, 4, and 5 and four blocks of combinational logic 6, 7, 8 and 9. The registers 2, 3, 4, and 5 are also designated FF_(a), FF_(b), FF_(c) and FF_(d), and they are clocked from a clock source 10. The clock is subjected to certain insertion delays δ_(a), δ_(b), δ_(c) and δ_(d), indicated by the delay blocks 11, 12, 13 and 14, as it is distributed to the registers.

Each of the combinational logic blocks 6, 7, 8 and 9 delays the digital signals passing through it. The delay of a combinational logic block, i.e. the delay from the output of one register to the input of another register, may vary between a shortest combinational delay D_(min) and a longest combinational delay D_(max). Thus the delay of e.g. the block 6, i.e. the delay from the output of register FF_(a) to the input of register FF_(b), may vary between a shortest combinational delay D_(min[a,b]) and a longest combinational delay D_(max[a,b]).

Very often D_(min) is considerably smaller than D_(max), which is illustrated by the combinational circuit 15 shown in FIG. 2, which could be any of the combinational logic blocks 6, 7, 8 and 9 in FIG. 1. The circuit 15 is connected between the output of a register 16 and the input of another register 17, and it is further connected to two external signals 18 and 19. These signals may be asynchronous signals or synchronous signals. However, we assume that they are stable when a change propagates through the circuit. The circuit 15 comprises the AND gates 20, 21 and 22 and an inverter 23. Each gate and inverter is supposed to have a delay of one nanosecond. It is seen that if the output from register 16 e.g. changes from a “1” to a “0” in a situation where the output of the circuit 15 is a “1”, the output will change to a “0” independent of the rest of the circuit already after 1 ns because the signal only has to propagate through gate 22. Thus D_(min[16,17]) is 1 ns. In other situations, however, the output signal will also depend on the external signals 18 and 19, the gates 20, 21 and the inverter 23, and it will have to propagate through all gates/inverters before the output is ready. This will take 4 ns and thus D_(max[16,17]) is 4 ns.

FIG. 3 shows the circuit of FIG. 1 with specific values of the longest and shortest combinational delays and of the insertion delays of the clock signals. It is seen that all four insertion delays are set to 10 ns corresponding to a traditional zero-skew clocking scheme. FIG. 4 illustrates a possible timing of the circuit.

It should be noted that for reasons of simplicity the time required for the data at the input of a register to latch, i.e. the set-up time, and the time required for the data to appear at the output of the register upon arrival of the clock signal are not taken into account. The same is true for the hold time of the register. In practice these times should also be considered, which will complicate the exact calculations but not change any of the following conclusions.

In the example the clock frequency is chosen as 25 MHz, corresponding to a clock period of 40 ns, which is well below the maximum clock frequency of the circuit. At the time t=0 all four registers are clocked, and their output signals are ready. If we look at the combinational logic block connecting the output of FF_(a) to the input of FF_(b), D_(min[a,b]) is 1 ns and D_(max[a,b]) is 5 ns, which means that the input of FF_(b) may change already after 1 ns but it may also take up to 5 ns before it is ready. This is illustrated by the shaded area in the upper part of FIG. 4. Similarly, the signal from FF_(b) to FF_(c) will arrive between 1 and 3 ns, the signal from FF_(c) to FF_(d) also between 1 and 3 ns, and the signal from FF_(b) to FF_(d) (i.e. the direct route) between 3 and 20 ns, which is also illustrated with shaded areas. It is seen that after 20 ns (i.e. D_(max[b,d])) all input signals to the registers are ready for the next clock pulse to arrive.

Thus the clock period (T) in FIG. 4 can be reduced from the 40 ns to any value down to 20 ns. Values below 20 ns are not possible because FF_(d) should not be clocked before its input signal is guaranteed to be ready. The situation with the clock period reduced to 20 ns, i.e. the clock frequency increased to 50 MHz, is shown in FIG. 5. It is supposed that D_(max[b,d]) cannot be reduced below the 20 ns, and thus 50 MHz is the highest obtainable clock frequency when zero-skew is used.

However, intentional clock skew allows the clock frequency to be increased further. Intentional clock skew means that the registers are allowed to be clocked at different times, i.e. the registers will have different δ values. Registers FF_(a) and FF_(d) will normally have to be clocked simultaneously because they represent the input and the output of the entire circuit, but it is seen from FIG. 5 that the registers FF_(b) and FF_(c) may be clocked earlier, because the data at their inputs have been ready for 15 and 17 ns, respectively, prior to the arrival of the clock pulse. In particular, if FF_(b) is clocked earlier, the data at the input of FF_(d) would be ready earlier, and FF_(d) could thus be clocked earlier with a reduction of the clock period being the result.

Although FF_(a) and FF_(d) are normally clocked simultaneously, as mentioned, it is noted that this is not a necessary condition for the following considerations.

The basic requirements for clock scheduling for the circuit to function correctly can be formulated in the following expressions for all values of i, j where there is a combinational path from the output of register i to the input of register j, and where T is the clock cycle time:

$\delta_i - \delta_j \leq T - D_{\max[i,j]}$  (1)

$\delta_i - \delta_j \geq -D_{\min[i,j]}$  (2)

According to (1) FF_(i) may be clocked later than FF_(j) (positive skew), but not by more than T−D_(max[i,j]), because otherwise the data would not reach FF_(j) before the next clock signal. According to (2) FF_(i) may be clocked before FF_(j) (negative skew), but not by more than D_(min[i,j]), because otherwise the data would reach FF_(j) before it is clocked, i.e. a race condition would occur.
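Requirements (1) and (2) can be checked mechanically for a given clock schedule. The following Python sketch is illustrative only: the function name `violations`, the data layout and the skew values δ_(b)=8 ns and δ_(c)=9 ns are assumptions chosen for this example (they are not taken from FIG. 6); the delay values are those stated for the circuit of FIG. 3.

```python
# Delays stated in the text for the circuit of FIG. 3, in ns:
# (sending register, receiving register) -> (D_min, D_max)
DELAYS = {
    ("a", "b"): (1, 5),
    ("b", "c"): (1, 3),
    ("c", "d"): (1, 3),
    ("b", "d"): (3, 20),
}


def violations(delays, skew, period):
    """List every path that breaks requirement (1) or (2).

    Returns tuples (i, j, kind) where kind is "late" for a violation of (1)
    and "race" for a violation of (2).
    """
    found = []
    for (i, j), (d_min, d_max) in delays.items():
        diff = skew[i] - skew[j]
        if diff > period - d_max:   # requirement (1) violated
            found.append((i, j, "late"))
        if diff < -d_min:           # requirement (2) violated
            found.append((i, j, "race"))
    return found


# Hypothetical schedule with delta_a = delta_d = 10 ns (assumed values).
SKEW = {"a": 10, "b": 8, "c": 9, "d": 10}

print(violations(DELAYS, SKEW, 18))  # no violations at T = 18 ns
print(violations(DELAYS, SKEW, 17))  # path b-d is too slow at T = 17 ns
```

With this assumed schedule both requirements hold at a clock period of 18 ns, while at 17 ns the path from FF_(b) to FF_(d) no longer satisfies (1).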

The requirement (1) can be used to calculate the smallest usable T. Since (1) must be true for any values of i, j, i.e. for any path from the output of one register to the input of another, it also must be true for combined paths. Thus

$\sum(\delta_i - \delta_j) \leq \sum(T - D_{\max[i,j]})$

for any combined path.

As an example, the circuit of FIG. 3 has two combined paths from its input to its output, i.e. the paths a-b-d (which will be used in the following to denote the path from register FF_(a) through register FF_(b) to register FF_(d)) and a-b-c-d, and thus

$(\delta_a - \delta_b) + (\delta_b - \delta_d) \leq (T - D_{\max[a,b]}) + (T - D_{\max[b,d]})$

and

$(\delta_a - \delta_b) + (\delta_b - \delta_c) + (\delta_c - \delta_d) \leq (T - D_{\max[a,b]}) + (T - D_{\max[b,c]}) + (T - D_{\max[c,d]})$

Since δ_(a) is supposed to be equal to δ_(d), as mentioned above, these expressions can be rewritten to:

$\delta_a - \delta_d = 0 \leq 2T - (D_{\max[a,b]} + D_{\max[b,d]})$

and

$\delta_a - \delta_d = 0 \leq 3T - (D_{\max[a,b]} + D_{\max[b,c]} + D_{\max[c,d]})$

or

$T \geq (D_{\max[a,b]} + D_{\max[b,d]})/2$

and

$T \geq (D_{\max[a,b]} + D_{\max[b,c]} + D_{\max[c,d]})/3$

Generally, the expression

$T \geq \frac{\sum D_{\max[i,j]}}{n}$  (3)

must be true for any combination of paths, where n is the number of paths in the combined path.

This means that the clock period must be selected to be at least the mean value of D_(max) for a loop or a path from input to output of the circuit, and since this must be true for any such path, T must be at least the mean value of D_(max) for the loop/path with the highest mean value of the D_(max) values. For the circuit of FIG. 3 this means that T≧(5 ns+20 ns)/2=12.5 ns. Thus the smallest obtainable value of T can be calculated for any circuit of the above-mentioned type from these expressions.
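The bound of expression (3) can be evaluated with a few lines of code. The sketch below is a minimal illustration, assuming that the input-to-output paths are listed by hand as sequences of register names; the helper names `mean_dmax` and `period_bound` are hypothetical. The D_(max) values are those stated for FIG. 3.

```python
# Longest combinational delays stated for FIG. 3, in ns.
D_MAX = {("a", "b"): 5, ("b", "c"): 3, ("c", "d"): 3, ("b", "d"): 20}


def mean_dmax(path, d_max):
    """Mean of D_max over the register-to-register hops of one combined path."""
    hops = list(zip(path, path[1:]))
    return sum(d_max[hop] for hop in hops) / len(hops)


def period_bound(paths, d_max):
    """Expression (3): T is at least the highest mean D_max over all paths."""
    return max(mean_dmax(path, d_max) for path in paths)


# The two combined paths from input (FF_a) to output (FF_d) of FIG. 3.
PATHS = [("a", "b", "d"), ("a", "b", "c", "d")]

print(period_bound(PATHS, D_MAX))  # 12.5
```

The path a-b-d dominates with (5+20)/2=12.5 ns, whereas a-b-c-d only requires (5+3+3)/3 ns, so the bound is set by a-b-d.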

The idea behind intentional clock skew is that the combinational block having the longest D_(max) in the loop or path with the longest total sum of the D_(max) values can “borrow” some of the time not utilized (so-called slack) by the other blocks of that loop/path, as long as the above requirement (3) is fulfilled.

Therefore, ideally it should be possible to reduce the clock period of the circuit of FIG. 3 to 12.5 ns. However, it is easily seen from FIG. 5 that this would cause a race condition to occur because the signal going from FF_(b) via FF_(c) to FF_(d) would arrive at FF_(d) too early. Intuitively it can be seen that in order to avoid this race condition the intentional clock skew must be limited to the values which are shown in FIGS. 6 and 7, because the clock frequency is actually also limited by the shortest combinational delays, not only the longest. The clock period can only be reduced to 18 ns corresponding to a clock frequency of 55.6 MHz. Although a clock period of 18 ns is better than the original 20 ns, it is still far from the ideal value of 12.5 ns.

It will be seen from FIG. 7 that the problem is not that the combinational logic connecting FF_(b) to FF_(d) has a long D_(max), but rather the big difference between D_(max) and D_(min) (including the parallel route via FF_(c)), because a clock period below this difference is not possible when race conditions are to be avoided. This can also be seen from the requirements (1) and (2). When requirements (1) and (2) are combined, it is found for any values of i, j that:

$-D_{\min[i,j]} \leq T - D_{\max[i,j]}$, or $T \geq D_{\max[i,j]} - D_{\min[i,j]}$.  (4)

Thus if the ideal lowest value of T calculated above violates (4), the lowest value of T will instead be limited by this expression. This is also called the “stiffness” of the circuit. Since also this expression must be true for any path of the circuit, the lowest usable clock period can be found by calculating the difference D_(max)−D_(min) for each combinational block in the circuit. In case of parallel and/or sequential routes (like FF_(b)-FF_(d) and FF_(b)-FF_(c)-FF_(d) in the example of FIGS. 3 and 6) the sum of the D_(max) values and the sum of the D_(min) values for each route are calculated, and then −ΣD_(min[i,j]) ≦ (n_(max)·T)−ΣD_(max[i,j]) must be true for any of the parallel routes, where n_(max) is the number of sequential paths, i.e. the number of clock periods, in the route for which ΣD_(max[i,j]) is calculated. The limiting value is then calculated as the highest ΣD_(max) minus the lowest ΣD_(min), divided by the number of clock periods (n_(max)) in the route with the highest ΣD_(max). Thus the clock period which can be obtained by intentional clock skewing is limited by the formula

$T \geq \frac{\max\left[\sum D_{\max}\right] - \min\left[\sum D_{\min}\right]}{n_{\max}}$  (5)

In the table of FIG. 8 the difference values (Diff) according to (5) have been calculated for each combinational block in the circuit (Comb), and as the highest difference value is 18 ns, this will also be the limit for T, in good correspondence with FIG. 7.
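The difference values of formula (5) can be reproduced with a short script. The following Python sketch is an illustration under stated assumptions: the grouping of routes into table rows and the names `hops`, `difference` and `GROUPS` are hypothetical, and only the highest value of 18 ns is confirmed by the text. The delays are those stated for FIG. 3.

```python
# Delays stated for FIG. 3, in ns.
D_MIN = {("a", "b"): 1, ("b", "c"): 1, ("c", "d"): 1, ("b", "d"): 3}
D_MAX = {("a", "b"): 5, ("b", "c"): 3, ("c", "d"): 3, ("b", "d"): 20}


def hops(route):
    """Split a route such as ("b", "c", "d") into its register-to-register hops."""
    return list(zip(route, route[1:]))


def difference(routes, d_min, d_max):
    """Formula (5) for a group of parallel routes between the same two registers."""
    totals = [
        (sum(d_max[h] for h in hops(r)), sum(d_min[h] for h in hops(r)), len(hops(r)))
        for r in routes
    ]
    highest_max, _, n_max = max(totals, key=lambda t: t[0])  # route with highest sum of D_max
    lowest_min = min(t[1] for t in totals)                   # lowest sum of D_min of any route
    return (highest_max - lowest_min) / n_max


# Assumed grouping: one row per block, plus one row for the parallel routes b-d and b-c-d.
GROUPS = {
    "a-b": [("a", "b")],
    "b-c": [("b", "c")],
    "c-d": [("c", "d")],
    "b-d / b-c-d": [("b", "d"), ("b", "c", "d")],
}

diffs = {name: difference(routes, D_MIN, D_MAX) for name, routes in GROUPS.items()}
print(diffs)
print(max(diffs.values()))  # 18.0
```

For the parallel routes the highest ΣD_(max) is 20 ns (route b-d, one clock period) and the lowest ΣD_(min) is 2 ns (route b-c-d), which gives the limiting value of 18 ns.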

It follows from the above that if the clock period should be reduced further, the mentioned difference values also need to be reduced. It is supposed that the D_(max) values cannot be reduced, or they are supposed to be reduced already as much as they can. However, according to the invention it will often be possible to increase the D_(min) values without increasing the D_(max) values, and that has the desired effect of reducing the difference values. FIG. 9 shows an example of how this can be done with the circuit from FIG. 2. In FIG. 9 the circuit has been modified by the insertion of two buffers 24 and 25 between the input of the circuit and the AND gate 22. When these buffers have a delay of 1 ns similar to the other gates, it is seen that D_(min) for the circuit is increased from 1 ns to 3 ns while D_(max) is unchanged at 4 ns. Thus the difference value for the circuit has been reduced from 3 ns to 1 ns.
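The effect of the two buffers can be illustrated with a very simple delay model. The sketch below is a simplification, not a timing analysis: it assumes that the circuit of FIG. 2 can be represented by one short branch (gate 22 only) and one long branch (all four elements), and that D_(min) and D_(max) are the shortest and longest branch totals. The name `delay_range` is hypothetical.

```python
GATE_DELAY_NS = 1  # each gate, inverter and buffer is assumed to delay by 1 ns


def delay_range(branches):
    """Return (D_min, D_max) as the shortest and longest branch delay totals."""
    totals = [sum(branch) for branch in branches]
    return min(totals), max(totals)


# Assumed branch model of circuit 15 in FIG. 2.
short_branch = [GATE_DELAY_NS]       # register 16 -> gate 22 -> register 17
long_branch = [GATE_DELAY_NS] * 4    # through gates 20, 21, inverter 23 and gate 22

before = delay_range([short_branch, long_branch])

# FIG. 9: buffers 24 and 25 are inserted in front of gate 22 on the short branch only.
buffered_branch = [GATE_DELAY_NS, GATE_DELAY_NS] + short_branch
after = delay_range([buffered_branch, long_branch])

print(before, before[1] - before[0])  # (1, 4) 3
print(after, after[1] - after[0])     # (3, 4) 1
```

Because the buffers sit only on the short branch, the longest delay is untouched and the difference value drops from 3 ns to 1 ns.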

It can be seen from the table of FIG. 8 that in order to obtain the ideal value of the clock period of 12.5 ns corresponding to a clock frequency of 80 MHz defined by the longest delays as mentioned above, the D_(min) value for the path b-c-d must be extended to 7.5 ns. To prevent path b-d from becoming the new restriction, its D_(min) value must be extended to 7.5 ns as well. Since D_(max) must be greater than or equal to D_(min) for any path, D_(max[b,c]) must also be extended, but that can be done without any influence on the result, because the value is small compared to the longest delays. The result is shown in the table of FIG. 10. The corresponding circuit and timing diagram are shown in FIGS. 11 and 12.
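The value of 7.5 ns follows from rearranging formula (5) for the lowest ΣD_(min). A minimal sketch, assuming the FIG. 3 delays and using the hypothetical names `required_min_sum` and `ROUTE_MIN_SUMS`:

```python
def required_min_sum(highest_max_sum, n_max, target_period):
    """Formula (5) solved for the lowest permissible sum of D_min values."""
    return highest_max_sum - n_max * target_period


# Current sums of D_min for the two parallel routes from FF_b to FF_d in FIG. 3 (ns).
ROUTE_MIN_SUMS = {"b-d": 3, "b-c-d": 1 + 1}

# The route with the highest sum of D_max is b-d: 20 ns over one clock period.
needed = required_min_sum(highest_max_sum=20, n_max=1, target_period=12.5)

# Extra minimum delay each route needs before 12.5 ns becomes feasible.
increase = {route: max(0.0, needed - current) for route, current in ROUTE_MIN_SUMS.items()}

print(needed)    # 7.5
print(increase)  # {'b-d': 4.5, 'b-c-d': 5.5}
```

How the additional delay on route b-c-d is split between the blocks b-c and c-d is a design choice that this sketch does not make.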

It will also be seen from FIG. 12 why the clock period cannot be reduced to values below 12.5 ns when the registers FF_(a) and FF_(d) have to be clocked simultaneously, as will normally be a requirement from surrounding circuitry. D_(max[a,b])+D_(max[b,d]) must not exceed two clock periods, or in other words the clock period cannot be less than the mean value of D_(max) for the path having the longest total delay, as has been mentioned earlier.

If, however, this requirement does not exist, the clock period can be reduced further. An example is illustrated in FIGS. 13, 14 and 15 in which the clock period has been reduced to 8 ns corresponding to a clock frequency of 125 MHz. However, it must be noted that in this example the clock skew exceeds the clock period, and this will only be possible when no external circuits require synchronism between the input and the output.

To illustrate the calculation of the obtainable clock periods, another example will be briefly described. FIG. 16 shows a circuit similar to that of FIG. 3, but now the longest delay is located between FF_(b) and FF_(c). With zero-skew clocking the shortest clock period is again 20 ns because D_(max[b,c])=20 ns, and the timing is shown in FIG. 17.

First the optimal clock period according to (3) is calculated. If it is again supposed that FF_(a) and FF_(d) must be clocked simultaneously, the path with the longest delay is a-b-c-d, and the mean value of D_(max) for this path is (7+20+3)/3=10 ns, and thus the clock period cannot be reduced below this value.

The difference values according to (5) are calculated in the table of FIG. 18. Here it is especially noted that the value for the two parallel paths from FF_(b) to FF_(d) is calculated as ((20+3)−1)/2=11 ns according to formula (5) above, because the highest sum of the D_(max) values has two components. The highest difference value is 17 ns for the path b-c, and thus with the conventional intentional skew scheme the clock period can be reduced to 17 ns as shown in the timing diagram of FIG. 19.

In order to reduce the clock period further according to the invention, the highest difference value of 17 ns for the path b-c must be reduced to 10 ns, and therefore D_(min[b,c]) is increased to 10 ns. The result is shown in FIGS. 20 and 21, from which it will be seen that now the difference value for the path b-d/b-c-d (11 ns) is the limiting factor. Consequently, also this difference value has to be reduced, and D_(min[b,d]) is therefore increased to 3 ns. The final result is shown in the table of FIG. 22 and the corresponding circuit in FIG. 23. The timing is illustrated in FIG. 24.

In the above examples it has been described how much the clock period can be reduced. However, it should be mentioned that the idea of the invention is to reduce the clock period, and thus increase the clock frequency, but not necessarily as much as possible. In the example just mentioned above, the clock period could be reduced from 17 ns to 10 ns. If, for example, a clock period of 15 ns is needed, a good and safe solution could be to extend D_(min[b,c]) from 3 ns to 6 ns, which would allow a clock period of 14 ns, thus providing one extra nanosecond as a safety margin.
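The procedure applied to the circuit of FIG. 16 (find the largest difference value, raise the corresponding minimum delay, repeat) can be sketched as follows. This is a simplified illustration, not the definitive procedure: each table row is treated as independent, only the two rows whose values are stated in the text are included, and the names `ROWS`, `diff` and `raise_minimums` are hypothetical.

```python
# Rows reconstructed from the values stated for FIG. 16 (ns):
# highest sum of D_max, lowest sum of D_min, and clock periods n_max in the longest route.
ROWS = {
    "b-c": {"max_sum": 20, "min_sum": 3, "n": 1},
    "b-d / b-c-d": {"max_sum": 20 + 3, "min_sum": 1, "n": 2},
}


def diff(row):
    """Difference value of one row according to formula (5)."""
    return (row["max_sum"] - row["min_sum"]) / row["n"]


def raise_minimums(rows, target_period):
    """Raise min_sum of every row whose difference exceeds the target period.

    Rows are visited from the largest difference downwards. Returns the
    updated rows and the list of (row name, new lowest sum of D_min).
    """
    rows = {name: dict(row) for name, row in rows.items()}  # leave the input untouched
    changes = []
    for name in sorted(rows, key=lambda k: diff(rows[k]), reverse=True):
        row = rows[name]
        if diff(row) <= target_period:
            break  # all remaining rows already meet the target
        row["min_sum"] = row["max_sum"] - row["n"] * target_period
        changes.append((name, row["min_sum"]))
    return rows, changes


rows_10, changes_10 = raise_minimums(ROWS, 10)
print(changes_10)  # [('b-c', 10), ('b-d / b-c-d', 3)]

rows_14, changes_14 = raise_minimums(ROWS, 14)
print(changes_14)  # [('b-c', 6)]
```

For a target of 10 ns the sketch raises the b-c minimum to 10 ns and the lowest ΣD_(min) of the parallel routes to 3 ns, as in the description. For a target of 14 ns only D_(min[b,c]) has to be raised, to 6 ns.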

As illustrated in FIG. 9, one way of increasing the shortest delay between two registers is to insert one or more cascaded buffers somewhere in the combinational path between the two registers. However, several other possibilities exist, and some of them are:

- cascaded buffers at the output of the sending register
- cascaded buffers at the input of the receiving register
- resizing and rearranging combinatorial gates
- a latch immediately downstream of the output of the sending register
- a latch immediately upstream of the input of the receiving register
- replacing the sending register with one with a built-in second slave stage, i.e. sending out on the opposite edge compared to the receiver sampling and its own input
- replacing the receiving register with one with a built-in second master stage, i.e. sampling on the opposite edge compared to the sender and its own output
- a combination of any of the above.

It should be noted that the invention as described above can be used in the design of a circuit from the beginning, or it can be used to improve an existing circuit. Thus a circuit can be designed by using the existing methods of clock skew scheduling while ignoring the expression (5) in order to obtain an optimal schedule. Then afterwards those of the shortest delays showing a race condition can be increased according to the invention.

Although a preferred embodiment of the present invention has been described and shown, the invention is not restricted to it, but may also be embodied in other ways within the scope of the subject-matter defined in the following claims.
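The second step of such a flow, i.e. finding the shortest delays that show a race condition for a schedule obtained without regard to (5), can be sketched as below. The schedule used here (δ_(a)=δ_(d)=10 ns, δ_(b)=2.5 ns, δ_(c)=9 ns for a 12.5 ns clock period) is an assumption made for the example and is not taken from the figures; the function name `race_repairs` is hypothetical. The delays are those stated for FIG. 3.

```python
# Delays stated for FIG. 3, in ns: (i, j) -> (D_min, D_max)
DELAYS = {
    ("a", "b"): (1, 5),
    ("b", "c"): (1, 3),
    ("c", "d"): (1, 3),
    ("b", "d"): (3, 20),
}


def race_repairs(delays, skew):
    """Return {(i, j): required D_min} for every path violating requirement (2).

    Requirement (2), delta_i - delta_j >= -D_min[i,j], is equivalent to
    D_min[i,j] >= delta_j - delta_i.
    """
    repairs = {}
    for (i, j), (d_min, _d_max) in delays.items():
        needed = skew[j] - skew[i]
        if needed > d_min:
            repairs[(i, j)] = needed
    return repairs


# Assumed schedule that satisfies requirement (1) at T = 12.5 ns.
SKEW = {"a": 10, "b": 2.5, "c": 9, "d": 10}

print(race_repairs(DELAYS, SKEW))  # {('b', 'c'): 6.5, ('b', 'd'): 7.5}
```

With this assumed schedule the paths b-c and b-d show race conditions and would need minimum delays of 6.5 ns and 7.5 ns respectively. The required D_(min[b,c]) exceeds the stated D_(max[b,c]) of 3 ns, so that maximum delay would have to be extended as well.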

1. A method of modifying a design of a synchronous digital circuit comprising a plurality of clocked storage devices and a plurality of combinational logic elements defining combinational paths between at least some of the clocked storage devices, each combinational path from an output of a first one of the clocked storage devices to an input of a second one of the clocked storage devices having a minimum delay value and a maximum delay value, such that the actual delay of the path assumes a value between the minimum delay value and the maximum delay value, the method comprising the steps of: identifying among the combinational paths a combinational path having a greatest difference between the maximum delay value and the minimum delay value, and reducing the difference between the maximum delay value and the minimum delay value by increasing the minimum delay value for the combinational path having the greatest difference, wherein the greatest difference in case of parallel paths is calculated as a difference between the highest maximum delay value and the lowest minimum delay value of the parallel paths, the maximum delay value for a sequential path is calculated as a sum of the maximum delay values for the paths comprised in the sequential path, and the minimum delay value for a sequential path is calculated as a sum of the minimum delay values for the paths comprised in the sequential path.
2. A method of claim 1, wherein the step of increasing the minimum delay value for a combinational path is performed by inserting a plurality of buffers in the combinational path.
3. A method of modifying a design of a synchronous digital circuit comprising a plurality of clocked storage devices and a plurality of combinational logic elements defining combinational paths between at least some of the clocked storage devices, each combinational path from an output of a first one of the clocked storage devices to an input of a second one of the clocked storage devices having a minimum delay value and a maximum delay value, such that the actual delay of the path assumes a value between the minimum delay value and the maximum delay value, the method comprising the steps of: identifying among the combinational paths a combinational path having a greatest difference between the maximum delay value and the minimum delay value, reducing the difference between the maximum delay value and the minimum delay value by increasing the minimum delay value for the combinational path having the greatest difference, identifying among sequential paths from an input to an output of the circuit and sequential paths defining loops in the circuit a sequential path having a highest mean value of the maximum delay values, calculating the highest mean value of the maximum delay values, identifying those paths for which a difference between the maximum delay value and the minimum delay value exceeds the highest mean value of the maximum delay values, and reducing the differences exceeding the highest mean value of the maximum delay values to be less than or equal to the highest mean value of the maximum delay values.
4. A system for modifying the design of a synchronous digital circuit comprising a plurality of clocked storage devices and a plurality of combinational logic elements defining combinational paths between at least some of the clocked storage devices, each combinational path from an output of a first one of the clocked storage devices to an input of a second one of the clocked storage devices having a minimum delay value and a maximum delay value, such that an actual delay of the path assumes a value between the minimum delay value and the maximum delay value, the system comprising: means for identifying among the combinational paths a combinational path having a greatest difference between the maximum delay value and the minimum delay value, means for reducing the difference between the maximum delay value and the minimum delay value by increasing the minimum delay value for the combinational path having the greatest difference, wherein the system is adapted to calculate the greatest difference in case of parallel paths as the difference between the highest maximum delay value and the lowest minimum delay value, to calculate the maximum delay value for a sequential path as the sum of the maximum delay values for the paths comprised in the sequential path, and to calculate the minimum delay value for a sequential path as the sum of the minimum delay values for the paths comprised in the sequential path.
5. A system of claim 4, wherein the system is adapted to increase the minimum delay value for a combinational path by the insertion of a plurality of buffers in the combinational path.
6. A system for modifying the design of a synchronous digital circuit comprising a plurality of clocked storage devices and a plurality of combinational logic elements defining combinational paths between at least some of the clocked storage devices, each combinational path from an output of a first one of the clocked storage devices to an input of a second one of the clocked storage devices having a minimum delay value and a maximum delay value, such that an actual delay of the path assumes a value between the minimum delay value and the maximum delay value, the system comprising: means for identifying among the combinational paths a combinational path having a greatest difference between the maximum delay value and the minimum delay value, means for reducing the difference between the maximum delay value and the minimum delay value by increasing the minimum delay value for the combinational path having the greatest difference, means for identifying among sequential paths from an input to an output of the circuit and sequential paths defining loops in the circuit the sequential path having the highest mean value of the maximum delay values, means for calculating the highest mean value of the maximum delay values, means for identifying those paths for which a difference between the maximum delay value and the minimum delay value exceeds the highest mean value of the maximum delay values, and means for reducing the differences exceeding the highest mean value of the maximum delay values to be less than or equal to the highest mean value of the maximum delay values.
7. A machine readable medium comprising instructions for causing a processing unit to modify a design of a synchronous digital circuit including a plurality of clocked storage devices and a plurality of combinational logic elements defining combinational paths between at least some of the clocked storage devices, each combinational path from an output of a first one of the clocked storage devices to an input of a second one of the clocked storage devices having a minimum delay value and a maximum delay value such that the actual delay of the path assumes a value between the minimum delay value and the maximum delay value, wherein the instructions cause the processing unit to perform: identifying the combinational path having the greatest difference between the maximum delay value and the minimum delay value; and reducing the difference between the maximum delay value and the minimum delay value by increasing the minimum delay value for the combinational path having the largest difference; wherein the medium comprises instructions for causing the processing unit to: calculate the greatest difference in case of parallel paths as a difference between a highest maximum delay value and a lowest minimum delay value; calculate a maximum delay value for a sequential path as a sum of maximum delay values for paths comprised in the sequential path; and calculate a minimum delay value for the sequential path as a sum of minimum delay values for paths comprised in the sequential path.
8. The medium of claim 7, wherein the medium comprises instructions for causing the processing unit to increase a minimum delay value for a combinational path by inserting a plurality of buffers in the combinational path.
9. A machine readable medium comprising instructions for causing a processing unit to modify a design of a synchronous digital circuit including a plurality of clocked storage devices and a plurality of combinational logic elements defining combinational paths between at least some of the clocked storage devices, each combinational path from an output of a first one of the clocked storage devices to an input of a second one of the clocked storage devices having a minimum delay value and a maximum delay value such that the actual delay of the path assumes a value between the minimum delay value and the maximum delay value, wherein the instructions cause the processing unit to perform: identifying the combinational path having the greatest difference between the maximum delay value and the minimum delay value; and reducing the difference between the maximum delay value and the minimum delay value by increasing the minimum delay value for the combinational path having the largest difference; wherein the medium comprises instructions for causing the processing unit to perform: identifying among sequential paths from an input to an output of the circuit and sequential paths defining loops in the circuit the sequential path having the highest mean value of the maximum delay values, calculating the highest mean value of the maximum delay values, identifying those paths for which a difference between a maximum delay value of the path and a minimum delay value of the path exceeds the highest mean value of the maximum delay values, and reducing the differences exceeding the highest mean value of the maximum delay values to be less than or equal to the highest mean value of the maximum delay values.